Genetic Programming, Validation Sets, and Parsimony Pressure

Authors

  • Christian Gagné
  • Marc Schoenauer
  • Marc Parizeau
  • Marco Tomassini
Abstract

Fitness functions based on test cases are very common in Genetic Programming (GP). This process can be likened to a learning task, in which models are inferred from a limited number of samples. This paper investigates two methods to improve generalization in GP-based learning: 1) selecting the best-of-run individuals using a three-data-set methodology, and 2) applying parsimony pressure to reduce the complexity of the solutions. Results using GP in a binary classification setup show that accuracy on the test sets is preserved, with lower variance than the baseline results, while the mean tree size obtained with the tested methods is significantly reduced.

This paper is an experimental study of methodologies for Evolutionary Computation (EC) inspired by common practices in the Machine Learning (ML) and Pattern Recognition (PR) communities. More specifically, using GP for supervised learning, we aim to evaluate both the effect of using a three-data-set methodology (training, validation, and test sets) and the effect of minimizing classifier complexity. Our experiments show that these approaches preserve the performance of GP while significantly reducing the size of the best-of-run solutions, in accordance with the Occam's Razor principle.

The paper is structured as follows. Section 1 starts with a high-level description of the tested approaches and their justifications. A presentation of relevant work follows in Section 2. The methodology used in the experiments is then detailed in Section 3. Finally, Section 4 presents the experimental results obtained on six binary classification data sets, and Section 5 concludes the paper.
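To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the exact form of the parsimony pressure or the selection rule, so the sketch assumes a linear size penalty (weight `alpha`) on training error and a best-of-run choice by lowest validation error with ties broken by size. The `Candidate` class, the toy data, and both classifiers are hypothetical stand-ins for evolved GP trees.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[float, int]  # (feature value, binary label)


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for an evolved GP classifier."""
    name: str
    size: int                       # node count of the GP tree (assumed measure)
    predict: Callable[[float], int]


def error_rate(c: Candidate, samples: List[Sample]) -> float:
    """Fraction of samples the candidate misclassifies."""
    wrong = sum(1 for x, y in samples if c.predict(x) != y)
    return wrong / len(samples)


def penalized_fitness(c: Candidate, train: List[Sample], alpha: float) -> float:
    """Assumed linear parsimony pressure: training error plus alpha * size."""
    return error_rate(c, train) + alpha * c.size


def best_of_run(candidates: List[Candidate], validation: List[Sample]) -> Candidate:
    """Pick the lowest validation error; break ties with the smaller tree."""
    return min(candidates, key=lambda c: (error_rate(c, validation), c.size))


# Toy data: the true rule is x > 0.5; the training label at 0.45 is noise.
train = [(0.1, 0), (0.4, 0), (0.45, 1), (0.6, 1), (0.9, 1)]
validation = [(0.2, 0), (0.3, 0), (0.48, 0), (0.7, 1), (0.8, 1)]
test = [(0.05, 0), (0.35, 0), (0.55, 1), (0.95, 1)]

simple = Candidate("simple", size=3, predict=lambda x: int(x > 0.5))
overfit = Candidate("overfit", size=25, predict=lambda x: int(x > 0.42))
candidates = [simple, overfit]

# The training set drives evolution, the validation set picks the
# best-of-run individual, and the test set is used only for the final report.
chosen = best_of_run(candidates, validation)
test_error = error_rate(chosen, test)
```

In this toy setup the larger candidate fits the noisy training sample perfectly, yet the validation set favors the smaller one. The test set plays no role in either evolution or selection, which is the point of keeping three separate sets.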


Similar resources

Effects of Code Growth and Parsimony Pressure on Populations in Genetic Programming

Parsimony pressure, the explicit penalization of larger programs, has been increasingly used as a means of controlling code growth in genetic programming. However, in many cases parsimony pressure degrades the performance of the genetic program. In this paper we show that poor average results with parsimony pressure are a result of 'failed' populations that overshadow the results of populations...


Effects of Code Growth and Parsimony Pressure on Populations in Genetic Programming

Parsimony pressure, the explicit penalization of larger programs, has been increasingly used as a means of controlling code growth in genetic programming. However, in many cases parsimony pressure degrades the performance of the genetic program. In this paper we show that poor average results with parsimony pressure are a result of "failed" populations which overshadow the results of population...


Effects of Code Growth and Parsimony Pressure on Populations in Genetic Programming

Parsimony pressure has been increasingly used as a means of controlling code growth in genetic programming. However, several published papers have shown that in some cases its use can degrade the performance of the genetic program [Koza, 1992; Nordin and Banzhaf, 1995]. In this paper we show that poor average results with parsimony pressure are a result of "failed" populations which overshadow th...


Lexicographic Parsimony Pressure

We introduce a technique called lexicographic parsimony pressure, for controlling the significant growth of genetic programming trees during the course of an evolutionary computation run. Lexicographic parsimony pressure modifies selection to prefer smaller trees only when fitnesses are equal (or equal in rank). This technique is simple to implement and is not affected by specific differences i...


Covariant Parsimony Pressure for Genetic Programming

The parsimony pressure method is perhaps the simplest and most frequently used method to control bloat in genetic programming. In this paper we first reconsider the size evolution equation for genetic programming developed in [24] and rewrite it in a form that shows its direct relationship to Price’s theorem. We then use this new formulation to derive theoretical results that show how to practi...




Publication date: 2006